vi ◾ Contents
1.5.8 Sequence Length Distribution 30
1.5.9 Sequence Duplication Levels 31
1.5.10 Overrepresented Sequences 31
1.5.11 Adapter Content 32
1.5.12 K-mer Content 33
1.6 PREPROCESSING OF THE FASTQ READS 34
1.7 SUMMARY 45
REFERENCES 46
Chapter 2 ◾ Mapping of Sequence Reads to the Reference Genomes 49
2.1 INTRODUCTION TO SEQUENCE MAPPING 49
2.2 READ MAPPING 55
2.2.1 Trie 56
2.2.2 Suffix Tree 56
2.2.3 Suffix Arrays 57
2.2.4 Burrows–Wheeler Transform 58
2.2.5 FM-Index 62
2.3 READ SEQUENCE ALIGNMENT AND ALIGNERS 63
2.3.1 SAM and BAM File Formats 65
2.3.2 Read Aligners 70
2.3.2.1 Burrows–Wheeler Aligner 71
2.3.2.2 Bowtie2 75
2.3.2.3 STAR 76
2.4 MANIPULATING ALIGNMENTS IN SAM/BAM FILES 79
2.4.1 Samtools 79
2.4.1.1 SAM/BAM Format Conversion 79
2.4.1.2 Sorting Alignment 80
2.4.1.3 Indexing BAM File 80
2.4.1.4 Extracting Alignments of a Chromosome 81
2.4.1.5 Filtering and Counting Alignment in SAM/BAM Files 81
2.4.1.6 Removing Duplicate Reads 82
2.4.1.7 Descriptive Statistics 83
2.5 REFERENCE-GUIDED GENOME ASSEMBLY 83
2.6 SUMMARY 85
REFERENCES 86